Spamassassin bayes issue, maybe


#1

Hi All

More housekeeping. After seeing an unexpected rise in ‘obvious’ spam (not UCE but the full-on variety), I’m trying to get a grip on SpamAssassin.

I may be wrong but it looks like the bayes engine isn’t scoring - e.g., all spamd ‘results’ in var/log/syslog/system include “BAYES_00, … bayes=0.000000,autolearn=no autolearn_force=no”. This also applies to (the few) rejected messages.
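A quick way to see whether every message really lands in BAYES_00 is to tally the BAYES_* rule names in the spamd result lines. Note that BAYES_00 normally means the classifier did run and rated the message as very unlikely to be spam, so a log full of BAYES_00 points at the training data rather than at a disabled engine. The sketch below assumes the log lines arrive on stdin; `bayes_tally` and `LOGFILE` are made-up names, so substitute whatever log path your system uses.

```shell
# Tally which BAYES_* rules appear in spamd result lines read from stdin.
# Prints one "RULE COUNT" pair per line, sorted by rule name.
bayes_tally() {
    grep -o 'BAYES_[0-9][0-9]*' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (LOGFILE is a placeholder, not a path taken from this thread):
#   bayes_tally < "$LOGFILE"
```

If the tally shows nothing but BAYES_00, the database contents are the next thing to look at.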

Testing an obvious spam offline, as admin, …

… shows a warning;

The test ends;

Content analysis details:   (4.9 points, 4.5 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 URIBL_BLOCKED          ADMINISTRATOR NOTICE: The query to URIBL was blocked.
                            See
                            http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
                             for more information.
                            [URIs: tuttipertrivero20.it]
 1.6 RCVD_IN_BRBL_LASTEXT   RBL: No description available.
                            [88.212.29.2 listed in bb.barracudacentral.org]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 2.0 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 1.3 RDNS_NONE              Delivered to internal network by a host with no rDNS

The sharp-eyed may have noticed that I’ve enabled pyzor and lowered the default score from 5 to 4.5.


The psychic know that I’ve swallowed google and taken corrective action to enable dkim functionality;


(A manual lookup suggests that URIBL_BLOCKED is due to multi.uribl.com blocking connections from 5.153.228.34)
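For anyone repeating the manual lookups: IP-based lists such as the Barracuda one in the report are queried by reversing the IPv4 octets and appending the list's zone, while URI lists such as URIBL take the domain as it stands. A minimal sketch of building both query names; the function names are hypothetical helpers, not part of any tool mentioned here.

```shell
# Build the DNS name to query for an IPv4 address on an IP-based blocklist:
# the octets are reversed and the list's zone is appended.
ip_query_name() {
    echo "$1" | awk -F. -v zone="$2" '{ print $4 "." $3 "." $2 "." $1 "." zone }'
}

# Build the DNS name to query for a domain on a URI blocklist:
# the domain is simply prefixed to the zone.
uri_query_name() {
    printf '%s.%s\n' "$1" "$2"
}

# Usage:
#   host "$(ip_query_name 88.212.29.2 bb.barracudacentral.org)"
#   host -t txt "$(uri_query_name tuttipertrivero20.it multi.uribl.com)"
```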

The bayes database looks OK at the end of this extract (-D shows dns info and the logger eval fail);

admin@vm1:~$ sa-learn --dump magic -D

Nov  7 17:28:49.877 [17285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3bc2420) implements 'learner_new', priority 0
Nov  7 17:28:49.878 [17285] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x3bc2420), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Nov  7 17:28:49.916 [17285] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x4391b28)
Nov  7 17:28:49.917 [17285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3bc2420) implements 'learner_is_scan_available', priority 0
Nov  7 17:28:49.918 [17285] dbg: bayes: tie-ing to DB file R/O /srv/.spamassassin/bayes_toks
Nov  7 17:28:49.919 [17285] dbg: bayes: tie-ing to DB file R/O /srv/.spamassassin/bayes_seen
Nov  7 17:28:49.920 [17285] dbg: bayes: found bayes db version 3
plugin: eval failed: Insecure dependency in sprintf while running with -T switch at /usr/share/perl5/Mail/SpamAssassin/Logger.pm line 241.
Nov  7 17:28:49.922 [17285] dbg: config: score set 1 chosen.
Nov  7 17:28:49.925 [17285] dbg: dns: EDNS, UDP payload size 4096
Nov  7 17:28:49.926 [17285] dbg: dns: servers obtained from Net::DNS : [80.68.80.24]:53, [80.68.80.25]:53, [2001:41c8:2::1]:53, [2001:41c8:2::2]:53
Nov  7 17:28:49.926 [17285] dbg: dns: nameservers set to 80.68.80.24, 80.68.80.25, 2001:41c8:2::1, 2001:41c8:2::2
Nov  7 17:28:49.926 [17285] dbg: dns: using socket module: IO::Socket::IP
Nov  7 17:28:49.927 [17285] dbg: dns: is Net::DNS::Resolver available? yes
Nov  7 17:28:49.927 [17285] dbg: dns: Net::DNS version: 0.81
Nov  7 17:28:49.927 [17285] dbg: sa-learn: spamtest initialized
Nov  7 17:28:49.928 [17285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3bc2420) implements 'learner_dump_database', priority 0
0.000          0          3          0  non-token data: bayes db version
0.000          0       6729          0  non-token data: nspam
0.000          0     114877          0  non-token data: nham
0.000          0     149479          0  non-token data: ntokens
0.000          0 1477936905          0  non-token data: oldest atime
0.000          0 1478539663          0  non-token data: newest atime
0.000          0 1478536051          0  non-token data: last journal sync atime
0.000          0 1478387014          0  non-token data: last expiry atime
0.000          0     345600          0  non-token data: last expire atime delta
0.000          0      71894          0  non-token data: last expire reduction count
Nov  7 17:28:49.930 [17285] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3bc2420) implements 'learner_close', priority 0
Nov  7 17:28:49.930 [17285] dbg: bayes: untie-ing
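To read the dump at a glance: SpamAssassin only uses Bayes once the database holds a minimum number of learned spam and ham messages (200 of each by default, via `bayes_min_spam_num` and `bayes_min_ham_num`). The sketch below pulls nspam and nham out of the dump and compares them with that default; `bayes_summary` is a made-up helper name, and the 200 threshold is hard-coded on the assumption that the defaults have not been changed in local.cf.

```shell
# Summarise nspam/nham from `sa-learn --dump magic` output read from stdin.
# Assumes the default minimum of 200 learned messages of each kind.
bayes_summary() {
    awk '
        /non-token data: nspam/ { spam = $3 }
        /non-token data: nham/  { ham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", spam, ham
            if (spam >= 200 && ham >= 200)
                print "enough training data for Bayes scoring"
            else
                print "below the default 200/200 minimum"
        }'
}

# Usage (the -D debug lines go to stderr, so only the table is piped):
#   sa-learn --dump magic | bayes_summary
```

With the figures above (6729 spam, 114877 ham) the minimum is comfortably met, so the lopsided ham-to-spam ratio looks like a more plausible suspect than a lack of training.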

So, first-up, does this look like a problem with bayes? I’m very keen to improve mail handling so any pointers will be super-appreciated.

Cheers,
Martin


#2

I had the URIBL_BLOCKED issue too. It happens because you are using Bytemark’s nameservers. The “fix” is to use your own VM as the nameserver for URIBL lookups. Here is the fix I used (below).


The following should be added to the bottom of /etc/spamassassin/local.cf

urirhssub URIBL_BLACK multi.uribl.com. A 2
body URIBL_BLACK eval:check_uridnsbl('URIBL_BLACK')
describe URIBL_BLACK Contains an URL listed in the URIBL blacklist
tflags URIBL_BLACK net
score URIBL_BLACK 3.0

urirhssub URIBL_GREY multi.uribl.com. A 4
body URIBL_GREY eval:check_uridnsbl('URIBL_GREY')
describe URIBL_GREY Contains an URL listed in the URIBL greylist
tflags URIBL_GREY net
score URIBL_GREY 0.25
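The trailing number on each urirhssub line is a bitmask that is tested against the last octet of the A record returned by multi.uribl.com: 2 for the black list and 4 for the grey list in the rules above. As far as I recall, the stock URIBL_BLOCKED rule is the same kind of check with mask 1, which is why a refused query shows up as a rule hit. A small sketch for decoding a returned address by hand; `decode_uribl` is a hypothetical helper, and the meaning of bit 1 is my assumption rather than something stated in this thread.

```shell
# Decode the last octet of a multi.uribl.com A record into list names.
# Bits 2 and 4 match the urirhssub rules above; bit 1 (query blocked)
# is an assumption based on the stock URIBL_BLOCKED rule.
decode_uribl() {
    octet=${1##*.}      # e.g. 6 from 127.0.0.6
    flags=""
    if [ $((octet & 1)) -ne 0 ]; then flags="$flags blocked"; fi
    if [ $((octet & 2)) -ne 0 ]; then flags="$flags black"; fi
    if [ $((octet & 4)) -ne 0 ]; then flags="$flags grey"; fi
    echo "${flags# }"
}

# Usage:
#   decode_uribl 127.0.0.2
```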

Install BIND

apt-get install bind9

Next, edit /etc/bind/named.conf.options (for example with nano)

and make it look like this (the IP addresses below are BYTEMARK’s DNS servers)

acl goodclients {
    localhost;
    localnets;
};

options {
    directory "/var/cache/bind";

    recursion yes;
    allow-query { goodclients; };
    forwarders {
            80.68.80.24;
            80.68.80.25;
            80.68.80.26;
            80.68.80.27;
            80.68.80.46;
    };
    forward only;
    dnssec-enable yes;
    dnssec-validation yes;
    auth-nxdomain no;    # conform to RFC1035
    listen-on-v6 { any; };
};

Next, edit /etc/bind/named.conf.local so that it looks like this

//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";

zone "multi.uribl.com" {
    type forward;
    forward first;
    forwarders {};
};

The above changes can be tested by running named-checkconf on the command line. If everything is OK there is no output; only errors are shown.

When done

service bind9 restart

Testing the BIND link to the Blacklist server

Test the setup by using the command

host -tTXT 2.0.0.127.multi.uribl.com

If this returns a URIBL_BLOCKED error then edit /etc/default/bind9 so it looks like this

# run resolvconf?
RESOLVCONF=yes

# startup options for the server
OPTIONS="-u bind"

Then check if /etc/resolv.conf is a symbolic link or not. You may need to do this

mv /etc/resolv.conf /etc/resolv.conf.orig
ln -s /run/resolvconf/resolv.conf /etc/resolv.conf

/run/resolvconf/resolv.conf should look like this

nameserver 127.0.0.1
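Once the symlink is in place it is worth confirming that the local BIND instance really is the first resolver the system will try. A minimal sketch, assuming a standard resolv.conf layout; `first_nameserver` is a made-up helper name and the file argument defaults to /etc/resolv.conf.

```shell
# Print the first nameserver listed in a resolv.conf-style file.
# The path defaults to /etc/resolv.conf when no argument is given.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Usage:
#   if [ "$(first_nameserver)" = "127.0.0.1" ]; then
#       echo "lookups go through the local BIND first"
#   fi
```

If this prints one of the Bytemark addresses instead, lookups are still bypassing the local server and URIBL will keep refusing them.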


#3

Hi gembix1

Thanks; looks great and step-by-step is always a bonus! It could be doubly useful because I’ve started using additional rbl & rhsbl to take the load off spamassassin & clamav. (bad.psky.me=127.0.0.2, barracuda and excommunicado.co.uk are proving most useful but I’ll post separately about this once I’ve tested a more generic syntax.)

Would it be sensible to add “2001:41c8:2::1, 2001:41c8:2::2” to the forwarders list (these are taken from the debug output)?

Meanwhile, BAYES_00 remains omnipresent but autolearn ham occurs occasionally (possibly with spam). There was a score of 17 yesterday, which seems remarkable given the bayes contribution and default scoring - the only adjustment has been to the spam threshold.

Cheers,
Martin


#4

Hi Martin

It cannot hurt to add the IPv6 addresses. Since all my setups are still in IPv4 I just never bothered thinking about it.

I also added a bunch of extra rules to SpamAssassin which get applied to all mails. Knowing the sorts of mail that pass through the machines, I was able to take some liberties and make some assumptions that are probably not justified in a more general situation, but the users seem to like it. So I apply regular expressions that look for strings in the headers like ‘sxx’ or ‘s*x’ or pu(s$|$s|$$)y or f[$@&#5%]ck, all of which are common strings seen in headers.

I also add a score of 2.0 for any emails from .xyz .cn .ru .mx .hu .club .load .party or .link
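Before turning a TLD list like this into a SpamAssassin header rule, the pattern can be tried out on a few sample addresses from the shell. This is only a sketch of the matching logic, not the rule itself; `tld_flagged` is a hypothetical helper, and the TLD list is copied as written above.

```shell
# Return success if the address ends in one of the listed TLDs
# (case-insensitive). The list is taken verbatim from the post above.
tld_flagged() {
    printf '%s\n' "$1" | grep -Eiq '\.(xyz|cn|ru|mx|hu|club|load|party|link)$'
}

# Usage:
#   if tld_flagged "someone@example.xyz"; then echo "would add 2.0"; fi
```

Anchoring the pattern at the end of the address matters, otherwise something like example.runner.com would match on the ".ru" in the middle.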

Kind regards

Beverley


#5

Thanks, Beverley; I’ll give bind9 a spin tonight or over the weekend.


#6

Hi Martin

Good luck with it all. Just one point: my BigV is a bare machine running Ubuntu LTS, not Symbiosis. Hopefully the config files are in much the same locations.

Kind regards

Beverley


#7

OK, thanks. If anything doesn’t match or if any :B people advise against (in time), I’ll back off.

Cheers,
Martin