spamassassin question about emails used for learning

helga.mayer at uni-hohenheim.de
Sun Feb 10 18:23:37 CET 2008





On Sun, 10 Feb 2008, Richard Bos wrote:

> On Sunday 10 February 2008 12:47, helga.mayer at uni-hohenheim.de wrote:
> > How long should one keep the messages that are used to teach
> > SpamAssassin what is spam and what is ham?  So far I have not been able
> > to find this information.  Must the (spam/ham) emails be kept for each
> > SpamAssassin (sa-learn) run, or can they be removed after, e.g., one
> > sa-learn run?
>
> There is a database named
>
> bayes_seen
>      A map of message-ID to what that message was learnt as. This is used
> so that SpamAssassin can avoid re-learning a message it has already seen,
> and so it can reverse the training if you later decide that message was
> previously learnt incorrectly.
>
> You don't need to keep the emails used for a sa-learn run.
> You would only need them if the database gets corrupted and you
> want to rebuild it from scratch, or if you want to reverse the training.

Thanks.  IOW: the messages used to train SpamAssassin can be removed directly
after the sa-learn run.  But it might be wise to keep a copy in case of
problems with the database.
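
To make the role of bayes_seen a bit more concrete, here is a minimal toy
sketch in Python. It is NOT SpamAssassin's real code or storage format; the
class ToyBayesStore and its methods are invented for illustration only. It
just shows the idea described above: a map from message-ID to "spam"/"ham"
lets the learner skip a message it has already seen, while reversing a
wrong training still needs the message text itself.

```python
# Toy illustration only: not SpamAssassin's implementation or file format.
from collections import Counter


class ToyBayesStore:
    """Keeps token counts per class plus a message-ID -> class 'seen' map."""

    def __init__(self):
        self.seen = {}  # message-ID -> "spam" or "ham"
        self.tokens = {"spam": Counter(), "ham": Counter()}

    def learn(self, message_id, text, label):
        """Learn a message; return False if it was already learnt as `label`."""
        previous = self.seen.get(message_id)
        if previous == label:
            return False  # already seen with this label: skip re-learning
        if previous is not None:
            # Learnt under the other label before: undo that training first.
            # Note that this needs the message text again.
            self.forget(message_id, text)
        self.tokens[label].update(text.lower().split())
        self.seen[message_id] = label
        return True

    def forget(self, message_id, text):
        """Reverse the training for a message; return False if never seen."""
        label = self.seen.pop(message_id, None)
        if label is None:
            return False
        self.tokens[label].subtract(text.lower().split())
        # Unary plus drops tokens whose count fell to zero or below.
        self.tokens[label] = +self.tokens[label]
        return True
```

In this toy model, calling learn() twice with the same message-ID and label
changes nothing the second time, which is why the mails need not be kept
just to avoid double counting. Only forget() (or re-learning under the other
label) needs the original text.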

In my opinion it is only worthwhile saving the messages if you are running
a small site and don't get enough mail (spam and non-spam) every day
to feed SpamAssassin.

Spam is changing so quickly that there isn't much use in keeping old
spam mails. I've been using SpamAssassin for more than 5 years and the
databases never got really corrupted. Occasionally I deleted the
SpamAssassin databases when they produced too many false positives
or when they used too much disk space.
Within a day I had enough spam and ham to get Bayes working again.
I mainly use autolearn.

Let's assume the text below refers to a collection of messages which users 
reported to be spam.

BTW: I found this in the Kolab Documentation (but it is German only)
http://www.kolab.org/doc/Allgemeine-Betriebsdokumentation-KolabServer22_20080103_1.0.pdf:

Nachdem SpamAssassin die ausgefilterten Spam-Mails gelernt hat, können diese
E-Mails theoretisch direkt im Anschluss des Lern-Durchlaufs gelöscht werden.
After SpamAssassin has learnt the filtered-out spam mails (filtered by whom
or what??), these e-mails can in theory be deleted immediately after the
learning run.

Ein manuelles Löschen der Mails hat aber den Vorteil, versehentlich als
unerwünscht deklarierte Nachrichten ins Benutzerpostfach „zurückzuholen“.
Deleting the mails manually, however, has the advantage that messages which
were declared as unwanted by mistake can be "brought back" into the user's
mailbox.

Assuming I understood the German text (yes, I have my doubts; I am German,
but English textbooks are so much easier to understand) and translated it
fairly correctly, this doesn't make much sense.
Users frequently report mails as spam by mistake because they missed the
'delete' button and clicked 'spam report' instead (depending, of course, on
the method you offer for spam reports). They won't be happy to get them back.

Mit dem Befehl ipurge lassen sich alle E-Mails löschen, die älter sind als x
Tage.
You can use the ipurge command to delete all mails that are older than x
days.

Hierbei muss beachtet werden, dass der Befehl nur auf den Spam-Ordner
ausgeführt wird.
Take care that the ipurge command is run only on the spam folder.

Eine Hilfe zu den möglichen Parametern ist auf der
Manual-Seite von ipurge zu finden (man ipurge als Nutzer kolab).
You'll find help for the options in the ipurge man page
(use 'man ipurge' as user kolab, or use 'man -M /kolab/man ipurge').
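
The rule described in the documentation (delete only mails older than x
days, and only in the spam folder) can be sketched as a small Python
function. This is not how ipurge itself works and it does not touch a real
mail store; the function name select_purgeable and the dictionary keys
"id", "folder" and "received" are assumptions made up for this example.

```python
# Toy sketch of the selection rule only; it does not call or imitate ipurge.
from datetime import datetime, timedelta


def select_purgeable(messages, max_age_days, now, folder="spam"):
    """Return the IDs of messages in `folder` received more than
    `max_age_days` days before `now`. Other folders are never selected."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        m["id"]
        for m in messages
        if m["folder"] == folder and m["received"] < cutoff
    ]


# Hypothetical sample data for illustration.
now = datetime(2008, 2, 10, 18, 0)
mailbox = [
    {"id": "a", "folder": "spam", "received": datetime(2008, 1, 1)},
    {"id": "b", "folder": "spam", "received": datetime(2008, 2, 9)},
    {"id": "c", "folder": "inbox", "received": datetime(2007, 12, 1)},
]
old_spam = select_purgeable(mailbox, max_age_days=14, now=now)
```

With this sample data only message "a" is selected: "b" is too recent and
"c" is old but not in the spam folder.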

Using Babelfish to translate the text was not very helpful.

Helga Mayer

