Spam/Ham
Jeroen van Meeuwen (Kolab Systems)
vanmeeuwen at kolabsys.com
Mon Dec 31 12:23:02 CET 2012
On 2012-12-30 01:42, Stefan Froehlich wrote:
> Personally I don't like the way to handle spam and ham described in the
> documentation. No user is willing to send real spam to the spam folder
> and real ham to the ham folder.
Actually, a "Mark as Junk" button is available in Roundcube (enable the
"markasjunk" plugin), and this button will move a message to a junk
folder (previously defined and user-configurable; "Spam" by default).
That said, the documentation so far has been written from an
administrator's perspective; the required user interaction has not been
thoroughly examined, and as such neither the flow for nor the
expectations of the user have necessarily been subjected to the right
amount of scrutiny.
> I'd like to implement another approach which is easier for the user. I'd
> like to learn all mails in the user's spam folder as spam and all mails
> not in the spam folder as ham. I think this is the natural intuitive way
> for a user.
I would love to consider what a more intuitive work-flow for a regular
user would be, but what I've found to be possible with the pieces in
place tends to require a compromise somewhere:
- One could set delete_mode and expunge_mode back to 'immediate', but
this compromises the ability to ensure all message(-file)s ever in any
mailbox are also included in at least one backup.
- One could keep delete_mode and expunge_mode set to 'delayed' and
only learn spam and ham after:
1) a ((virtual?) full?) backup has completed, and
2) message files for expunged messages have been deleted from the
filesystem (by running cyr_expire -D 0s -X 0s -E 3?).
2a) I've found that it is not possible to run, for example,
cyr_expire -E 3 -X 0s user/john.doe/Spam at example.org, which may be a bug
in the software but is the status quo nonetheless.
2b) I've found that it is actually particularly hard, in practice, to
recognize which folder a user believes contains (or is to contain) the
messages that user considers real spam, unless semi-strict defaults are
in place and only a limited number of options to change the spam folder
are offered. My point is localization: recognizing the different names
a user may give the folder (in any locale), capitalization and
case-sensitivity. "The problem is choice", if you will, though we do
have an annotation '/vendor/kolab/folder-type = mail.junkemail'. On top
of that, users will press "Delete" for spam messages and press "Mark as
Junk" for newsletters they have subscribed to themselves.
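To illustrate why the annotation is more dependable than folder names, here is a rough Python sketch of picking a user's spam folder(s). It assumes the folder list and each folder's /vendor/kolab/folder-type value have already been fetched (how is out of scope here), that "/" is the hierarchy separator, and that the fallback names are a hypothetical, deliberately incomplete set. It is not how Kolab itself resolves the spam folder.

```python
from typing import Dict, List, Optional

# Annotation value quoted in the text above.
JUNK_TYPE = "mail.junkemail"

# Hypothetical fallback names; a real deployment would need a much longer,
# curated list covering every locale its users work in.
FALLBACK_NAMES = {"spam", "junk", "junk e-mail", "junk-e-mail"}


def find_spam_folders(folders: Dict[str, Optional[str]]) -> List[str]:
    """Return the folders that most likely hold a user's spam.

    `folders` maps each folder name (with "/" as hierarchy separator) to the
    value of its /vendor/kolab/folder-type annotation, or None if unset.
    """
    # Prefer the explicit annotation: it survives renames and localization.
    annotated = [name for name, ftype in folders.items() if ftype == JUNK_TYPE]
    if annotated:
        return sorted(annotated)
    # Otherwise fall back to a case-insensitive match on the last path component.
    return sorted(
        name
        for name in folders
        if name.rsplit("/", 1)[-1].casefold() in FALLBACK_NAMES
    )
```

A folder a German-speaking user decided to call "Werbung" is only found through the annotation; no short list of names would catch it, which is the "problem is choice" point in a nutshell.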
> I created a sieve filter which moves all mails tagged as X-Spam-Flag:
> YES to the user's spam folder. Also if a user sees a message in the spam
> folder which he thinks is not spam he simply moves it out from there to
> the inbox (or a subfolder).
With later versions of Kolab, we'll have a feature called "Sieve Script
Management" (KEP #14[1]): an administrator can then specify a set of
MUST-HAVE rules for a user.
This will allow an administrator to make sure the user's sieve scripts
are preceded by a "managed" segment, which may contain, for example, a
'fileinto "Spam";' action.
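KEP #14 is not implemented yet, so the following is only a hypothetical Python sketch of what prepending a managed segment to a user's script could involve, not its actual design. One wrinkle worth noting: Sieve (RFC 5228) only permits "require" before any other command, so the user's own require statements have to be merged into a single one at the very top. The sketch assumes single-line require statements and a folder name without quotes or backslashes; compose_script and the BEGIN/END comment markers are made up for illustration.

```python
import re

# Matches a single-line "require" statement, e.g. require ["fileinto", "vacation"];
REQUIRE_RE = re.compile(
    r'^[ \t]*require\s+(\[[^\]]*\]|"[^"]*")\s*;[ \t]*$', re.MULTILINE
)


def compose_script(user_script: str, spam_folder: str = "Spam") -> str:
    """Prepend a (hypothetical) managed segment to a user's Sieve script."""
    # Collect every capability the user's script requires, plus "fileinto"
    # for the managed rule, and drop the user's own require lines.
    capabilities = {"fileinto"}
    for match in REQUIRE_RE.finditer(user_script):
        capabilities.update(re.findall(r'"([^"]*)"', match.group(1)))
    user_body = REQUIRE_RE.sub("", user_script).strip()

    # One merged require at the top, as Sieve demands.
    require = "require [%s];" % ", ".join('"%s"' % c for c in sorted(capabilities))
    # Assumes spam_folder contains no quotes or backslashes (no escaping done).
    managed = (
        "# BEGIN managed segment\n"
        'if header :is "X-Spam-Flag" "YES" {\n'
        f'    fileinto "{spam_folder}";\n'
        "}\n"
        "# END managed segment"
    )
    return "\n".join(part for part in (require, managed, user_body) if part) + "\n"
```

The managed rule deliberately has no "stop", so the user's own rules still run afterwards; whether a managed segment should end processing is a policy decision the sketch leaves open.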
> All these messages should be learned as ham.
> I started writing a bash script to handle all these things. The idea is
> (to increase speed) to insert another X- tag into the mail, let's say
> X-<ServerName>-Learned-As:, and the possible values are spam or ham. I
> introduced this for performance reasons.
I would recommend *not* changing the contents of the email (on the
filesystem or otherwise, in fact):
- Learning spam / ham will remember the tokens learned, and as such an
email will not be "learned" twice.
- Using the filesystem ctime/mtime for a message is, I think, a more
appropriate approach (i.e. "only learn messages that are 'new' within
the last $x days").
- After Spam / Ham is learned, the folder can be pruned of its contents
using /usr/lib/cyrus-imapd/ipurge.
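A minimal Python sketch of the mtime-based selection, assuming the usual Cyrus spool layout where each message is stored as a file named "<uid>." next to the cyrus.* metadata files. The folder path and the function names are hypothetical. It uses st_mtime; st_ctime can be swapped in the same way if that suits better.

```python
import time
from pathlib import Path
from typing import List, Optional


def is_message_file(path: Path) -> bool:
    # Cyrus stores each message as "<uid>." (e.g. "123."); skip cyrus.index etc.
    return path.is_file() and path.name.endswith(".") and path.name[:-1].isdigit()


def recent_message_files(
    folder: Path, days: float, now: Optional[float] = None
) -> List[Path]:
    """Message files in `folder` whose mtime falls within the last `days` days."""
    cutoff = (time.time() if now is None else now) - days * 86400
    recent = [
        p for p in folder.iterdir()
        if is_message_file(p) and p.stat().st_mtime >= cutoff
    ]
    # Sort numerically by UID so "10." comes after "9.".
    return sorted(recent, key=lambda p: int(p.name[:-1]))
```

Note that this still returns files of messages that were flagged \Deleted, or expunged but not yet removed from disk; it only limits how much is fed to the learner in each run.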
> Now I ran into several problems (All is on Debian Wheezy):
>
> 1) A folder does not necessarily contain only valid undeleted mail
> files. Let's say a user moves some mails out of the spam folder; the mail
> files are still in the spam folder. I can't see a way how to distinguish
> between real mail files and those that have been deleted already but not
> deleted from the filesystem yet.
>
This problem is two-fold:
1) a client application only needs to flag the messages as \Deleted,
but does not need to issue an EXPUNGE to the folder, and
2) Individual message files (that correspond to messages previously
flagged as \Deleted and in folders on which an EXPUNGE has indeed been
issued) are not immediately deleted from the filesystem.
The way that I myself (therefore) learn Spam (and Ham) is to first
learn Spam, and on top of that learn Ham. My understanding is that when
the same message(s, -tokens) have first been learned as Spam but are
then learned as Ham, the learner should forget what it had learned and
learn the message as Ham.
The resulting instruction to the user is then to move or copy messages
that are not actually spam from the Spam folder to the Ham folder (and
perhaps also copy them back to the INBOX or any other folder).
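Assuming SpamAssassin's sa-learn is the learner in use (its --spam and --ham options accept message files as arguments), the spam-then-ham order could be driven from a small Python script like the one below. The file lists are assumed to come from the Spam and Ham folders, the function names are hypothetical, and sa-learn typically needs to run as the user that owns the Bayes database.

```python
import subprocess
from typing import List, Sequence


def learn_commands(spam_files: Sequence[str], ham_files: Sequence[str]) -> List[List[str]]:
    """Build sa-learn invocations: spam first, then ham on top of it."""
    commands = []
    if spam_files:
        commands.append(["sa-learn", "--spam", *spam_files])
    # Ham goes last: per the understanding described above, relearning a
    # message as ham should replace an earlier spam verdict for it.
    if ham_files:
        commands.append(["sa-learn", "--ham", *ham_files])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if sa-learn fails.
            subprocess.run(command, check=True)
```

run(learn_commands(spam, ham)) only prints the commands; pass dry_run=False to execute them. Very large folders may exceed the command-line length limit and would need to be split into batches.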
> 2) If I change a file on filesystem level how can I let cyrus know so
> that it is aware of this change?
>
I would recommend against changing the files on the filesystem, as the
only way to let Cyrus IMAP pick them up is to reconstruct the folder,
and this would also re-activate (or re-insert, if you will) the messages
in the spool that have previously been expunged.
> 3) I somehow corrupted my spam folder. The standard installation of
> kolab doesn't install the reconstruct binary so I was unable to recover
> this folder. Where do I find the reconstruct binary?
>
This utility should normally be shipped (on Debian Wheezy) as
/usr/lib/cyrus-imapd/reconstruct. If it's not installed as part of the
package(s), I would love to see a ticket on issues.kolab.org for it.
> 4) To increase performance I'd like to react to a user's move request
> instead of scanning all mail folders. Is there a way to run a script if
> a user is about to move a message or if a user has just moved a
> message?
>
There's a notify socket one could listen on in order to pick up
notifications on events (such as a new delivery, or any other change).
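A hedged Python sketch of listening on such a socket. It assumes the notify socket is a Unix datagram socket (the notifyd entry in sample cyrus.conf files uses proto="udp"), that the path passed in is the one configured for it, and that whatever process binds it takes the place of notifyd. The payload format is not described here, so each datagram is treated as opaque text; open_listener and events are hypothetical names.

```python
import os
import socket
from typing import Iterator


def open_listener(path: str) -> socket.socket:
    """Bind a Unix datagram socket at `path`, replacing a stale socket file."""
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def events(sock: socket.socket, bufsize: int = 65536) -> Iterator[str]:
    """Yield each received datagram as text (format assumed to be opaque)."""
    while True:
        data = sock.recv(bufsize)
        yield data.decode("utf-8", errors="replace")
```

A loop such as "for text in events(open_listener(path))" could then trigger learning for just the affected folder instead of scanning every folder.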
Kind regards,
Jeroen van Meeuwen
[1] https://wiki.kolab.org/KEP:14
--
Systems Architect, Kolab Systems AG
e: vanmeeuwen at kolabsys.com
m: +44 74 2516 3817
w: http://www.kolabsys.com
pgp: 9342 BF08
More information about the users mailing list